Algebraic Techniques for Analysis of Large Discrete-Valued Datasets

Authors

  • Mehmet Koyutürk
  • Ananth Grama
  • Naren Ramakrishnan

Abstract

With the availability of large-scale computing platforms and instrumentation for data gathering, increased emphasis is being placed on efficient techniques for analyzing large and extremely high-dimensional datasets. In this paper, we present a novel algebraic technique based on a variant of semi-discrete matrix decomposition (SDD), which is capable of compressing large discrete-valued datasets in an error-bounded fashion. We show that this process of compression can be thought of as identifying dominant patterns in the underlying data. We derive efficient algorithms for computing dominant patterns, quantify their performance analytically as well as experimentally, and identify applications of these algorithms in problems ranging from clustering to vector quantization. We demonstrate the superior characteristics of our algorithm in terms of (i) scalability to extremely high dimensions; (ii) bounded error; and (iii) hierarchical nature, which enables multiresolution analysis. Detailed experimental results are provided to support these claims.
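The abstract does not spell out the algorithm, so the following is only a rough illustration of the idea, not the authors' method. It assumes a rank-one approximation A ≈ x yᵀ in which both x and y are 0/1 vectors (a restriction of SDD, whose vectors take entries in {−1, 0, 1}). For such vectors, ‖A − x yᵀ‖²_F = ‖A‖²_F − 2 xᵀA y + |x|·|y|, so each vector can be updated exactly while the other is held fixed. The pattern y is then treated as a "dominant pattern", and rows are split recursively until every row lies within a chosen Hamming radius of its group's pattern, which mimics the bounded-error and hierarchical properties described above. The function names, the densest-row initialization, the Hamming-radius stopping rule, and the degenerate-split fallback are all hypothetical choices.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # each row is a list of 0/1 entries


def rank_one_binary(A: Matrix, max_iters: int = 50) -> Tuple[List[int], List[int]]:
    """Approximate A by x y^T with binary x (row selector) and y (pattern).

    For binary vectors, ||A - x y^T||_F^2 = ||A||_F^2 - 2 x^T A y + |x| |y|,
    so we maximize 2 x^T A y - |x| |y| by exact alternating updates.
    """
    m, n = len(A), len(A[0])
    # Assumed initialization: seed the pattern with the densest row.
    y = list(max(A, key=sum))
    x = [0] * m
    for _ in range(max_iters):
        ny = sum(y)
        # Fix y: x_i = 1 exactly when 2 (A y)_i > |y|.
        x = [1 if 2 * sum(a * b for a, b in zip(row, y)) > ny else 0 for row in A]
        nx = sum(x)
        # Fix x: y_j = 1 exactly when 2 (A^T x)_j > |x|.
        new_y = [
            1 if 2 * sum(A[i][j] for i in range(m) if x[i]) > nx else 0
            for j in range(n)
        ]
        if new_y == y:  # y unchanged, so x would not change either: fixed point
            break
        y = new_y
    return x, y


def hamming(u: List[int], v: List[int]) -> int:
    """Number of positions where two 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def decompose(rows: Matrix, radius: int) -> List[Tuple[List[int], Matrix]]:
    """Recursively split rows until every row in a group lies within
    `radius` (Hamming distance) of that group's pattern vector.
    Returns a list of (pattern, rows) pairs, one per leaf group."""
    x, y = rank_one_binary(rows)
    if all(hamming(r, y) <= radius for r in rows):
        return [(y, rows)]  # error bound met: keep this pattern
    inside = [r for r, xi in zip(rows, x) if xi]
    outside = [r for r, xi in zip(rows, x) if not xi]
    if not inside or not outside:
        # Hypothetical fallback for a degenerate split: peel off the row
        # farthest from the pattern so the recursion always makes progress.
        far = max(range(len(rows)), key=lambda i: hamming(rows[i], y))
        inside = [r for i, r in enumerate(rows) if i != far]
        outside = [rows[far]]
    # Each branch is decomposed independently, forming a pattern hierarchy.
    return decompose(inside, radius) + decompose(outside, radius)
```

The returned list holds only the leaf patterns; the recursion itself forms the hierarchy, so stopping it earlier (for example with a larger `radius`) typically yields fewer, coarser patterns. A single row always terminates, since its own pattern reproduces it exactly.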

Similar resources

Computational Techniques for Model Predictive Control of Large-Scale Systems with Continuous-Valued and Discrete-Valued Inputs

We propose computational techniques for model predictive control of large-scale systems with both continuous-valued control inputs and discrete-valued control inputs, which are a class of hybrid systems. In the proposed method, we introduce the notion of virtual control inputs, which are obtained by relaxing discrete-valued control inputs to continuous variables. In online computation, first, w...

Algorithms for Bounded-Error Correlation of High Dimensional Data in Microarray Experiments

The problem of clustering continuous-valued data has been well studied in the literature. Its application to microarray analysis relies on such algorithms as k-means, dimensionality reduction techniques, and graph-based approaches for building dendrograms of sample data. In contrast, similar problems for discrete-attributed data are relatively unexplored. An instance of analysis of discrete-attribu...

Chemometrics-enhanced Classification of Source Rock Samples Using their Bulk Geochemical Data: Southern Persian Gulf Basin

Chemometric methods can enhance geochemical interpretations, especially when working with large datasets. With this aim, exploratory hierarchical cluster analysis (HCA) and principal component analysis (PCA) methods are used herein to study the bulk pyrolysis parameters of 534 samples from the Persian Gulf basin. These methods are powerful techniques for identifying the patterns of variations i...

Algebraic Multigrid Solvers for Complex-Valued Matrices

In the mathematical modeling of real-life applications, systems of equations with complex coefficients often arise. While many techniques of numerical linear algebra, e.g., Krylov-subspace methods, extend directly to the case of complex-valued matrices, some of the most effective preconditioning techniques and linear solvers are limited to the real-valued case. Here, we consider the extension of...

Recent Developments in Discrete Event Systems

This article is a brief exposition of the process approach to a newly emerging area called "discrete event systems" in control theory and summarizes some of the recent developments in this area. Discrete event systems is an area of research that is developing within the interstices of computer, control and communication sciences. The basic direction of research addresses issues in the analysis an...

Publication date: 2002